Abstract

We connect the power of Confidence Intervals in different Frequentist methods to their reliability. We show that in the case of a bounded parameter a biased method which near the boundary has large power in testing the parameter against larger alternatives and small power in testing the parameter against smaller alternatives is desirable. Considering the recently proposed methods with correct coverage, we show that the Maximum Likelihood Estimator method [1, 2] has optimal bias.

It is well known that the most important property of Frequentist Confidence Intervals is coverage: a $100(1-\alpha)\%$ Confidence Interval belongs to a set of intervals that cover the true value of the measured quantity $\mu$ with Frequentist probability $1-\alpha$. Neyman's method obtains Confidence Intervals with correct coverage through the construction, for each possible value of $\mu$, of an acceptance interval with probability $1-\alpha$ for an estimator $\hat\mu$ of $\mu$. The union of all acceptance intervals in the $(\hat\mu, \mu)$ plane is called the Confidence Belt. The Confidence Interval for $\mu$ resulting from a measurement $\hat\mu_{\mathrm{obs}}$ of the estimator is the set of all values of $\mu$ whose acceptance interval for $\hat\mu$ includes $\hat\mu_{\mathrm{obs}}$.
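As a minimal sketch of Neyman's construction, assume an estimator $\hat\mu \sim N(\mu, 1)$ with unit standard deviation and a 90% Central Interval belt; both are illustrative assumptions, and the function names are hypothetical. The belt is tabulated on a grid of $\mu$ values and inverted by keeping every $\mu$ whose acceptance interval contains $\hat\mu_{\mathrm{obs}}$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: mu_hat ~ N(mu, 1)


def acceptance_interval(mu, cl=0.90):
    """Equal-tail acceptance interval for mu_hat when the true value is mu."""
    z = STD_NORMAL.inv_cdf(1 - (1 - cl) / 2)  # about 1.645 for cl = 0.90
    return mu - z, mu + z


def neyman_interval(mu_hat_obs, mu_grid, cl=0.90):
    """Invert the belt: collect every mu whose acceptance interval contains mu_hat_obs."""
    accepted = []
    for mu in mu_grid:
        lo, hi = acceptance_interval(mu, cl)
        if lo <= mu_hat_obs <= hi:
            accepted.append(mu)
    if not accepted:
        return None  # empty Confidence Interval
    return min(accepted), max(accepted)


mu_grid = [-10 + 0.001 * i for i in range(20001)]
lo, hi = neyman_interval(1.0, mu_grid)
print(round(lo, 2), round(hi, 2))  # -0.64 2.64
```

For an unbounded $\mu$ this just reproduces $[\hat\mu_{\mathrm{obs}} - 1.645, \hat\mu_{\mathrm{obs}} + 1.645]$ up to the grid step; the grid inversion is kept because it works unchanged for any other belt.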

Coverage is not the only relevant property of Confidence Intervals, because many methods for the construction of a Confidence Belt with exact coverage are available (see Refs. [?, ?, ?]). These methods differ by power, a quantity which is obtained by considering the construction of acceptance intervals as hypothesis testing. Coverage and power are connected, respectively, with the so-called Type I and Type II errors in testing a simple statistical hypothesis $H_0$ against a simple alternative hypothesis $H_1$ (see Ref. [3], section 20.9):

Type I error: Reject the null hypothesis $H_0$ when it is true. The probability of a Type I error is called the size of the test and it is usually denoted by $\alpha$.

Type II error: Accept the null hypothesis $H_0$ when the alternative hypothesis $H_1$ is true. The probability of a Type II error is usually denoted by $\beta$. The power of a test is the probability $\pi = 1-\beta$ to reject $H_0$ if $H_1$ is true. A test is Most Powerful if its power is the largest among all possible tests of the same size. This is clearly the best choice.
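To make size and power concrete, here is a sketch for a simple null $H_0: \mu = \mu_0 = 0$ against a simple alternative $H_1: \mu = \mu_1 = 2$. It assumes $\hat\mu \sim N(\mu, 1)$ and $\alpha = 0.10$, which are illustrative values only. The sketch compares the equal-tail test with the one-sided test that rejects only large $\hat\mu$. For $\mu_1 > \mu_0$, the Neyman-Pearson lemma makes the one-sided test the Most Powerful test of this size.

```python
from statistics import NormalDist

N = NormalDist()  # assumption: mu_hat ~ N(mu, 1)


def size_and_power(accept_lo, accept_hi, mu0, mu1):
    """Return (alpha, pi) for the test accepting H0: mu = mu0 iff accept_lo <= mu_hat <= accept_hi."""
    def p_accept(mu):
        return N.cdf(accept_hi - mu) - N.cdf(accept_lo - mu)
    alpha = 1 - p_accept(mu0)  # Type I error probability (size)
    beta = p_accept(mu1)       # Type II error probability against H1: mu = mu1
    return alpha, 1 - beta


ALPHA, MU0, MU1 = 0.10, 0.0, 2.0
z_two = N.inv_cdf(1 - ALPHA / 2)  # equal-tail test: rejects large and small mu_hat
z_one = N.inv_cdf(1 - ALPHA)      # one-sided test: rejects only large mu_hat

size_two, power_two = size_and_power(MU0 - z_two, MU0 + z_two, MU0, MU1)
size_one, power_one = size_and_power(float("-inf"), MU0 + z_one, MU0, MU1)
print(round(size_two, 3), round(power_two, 2))
print(round(size_one, 3), round(power_one, 2))
```

Both tests have size 0.10. Against $\mu_1 = 2$, the one-sided test has power of about 0.76, versus about 0.64 for the equal-tail test. The price is almost no power against alternatives $\mu_1 < \mu_0$, so neither test is most powerful on both sides at once.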

Unfortunately, the power associated with a confidence belt is not easy to evaluate, because for each possible value $\mu_0$ of $\mu$ considered as a null hypothesis there is no simple alternative hypothesis that allows one to calculate the probability $\beta$ of a Type II error. Instead, we have the alternative hypothesis $H_1: \mu_1 \neq \mu_0$, which is composite. For each value of $\mu_1 \neq \mu_0$ one can calculate the probability $\beta_{\mu_0}(\mu_1)$ of a Type II error associated with a given acceptance interval corresponding to $\mu_0$. A method that gives an acceptance region for $\mu_0$ which has the largest possible power $\pi_{\mu_0}(\mu_1) = 1 - \beta_{\mu_0}(\mu_1)$ is Most Powerful with respect to the alternative $\mu_1$. Clearly, it would be desirable to find a Uniformly Most Powerful test, i.e. a test that gives an acceptance region for $\mu_0$ which has the largest possible power $\pi_{\mu_0}(\mu_1)$ for any value of $\mu_1$. Unfortunately, the Neyman-Pearson lemma implies that in general a Uniformly Most Powerful test does not exist if the alternative hypothesis is two-sided, i.e. both $\mu_1 < \mu_0$ and $\mu_1 > \mu_0$ are possible, and the derivative of the Likelihood with respect to $\mu$ is continuous in $\mu_0$ (see Ref. [3], section 20.18). Nevertheless, it is possible to find a Uniformly Most Powerful test if the class of tests is restricted in appropriate ways. A class of tests that has some merit is that of unbiased tests, for which the power $\pi_{\mu_0}(\mu_1)$ for any value of $\mu_1$ is larger than or equal to the size $\alpha$ of the test,

$$\pi_{\mu_0}(\mu_1) \geq \alpha \quad \text{for all } \mu_1 . \tag{1}$$

In other words, the probability of rejecting $\mu_0$ when it is false is at least as large as the probability of rejecting $\mu_0$ when it is true. The equal-tail test used in the Central Intervals method is unbiased and Uniformly Most Powerful Unbiased for distributions belonging to the exponential family, such as, for example, the Gaussian and Poisson distributions (see Ref. [3], section 21.31).

Fig. 1: A: Power $\pi$ in the Central Intervals method for an estimator $\hat\mu$ of $\mu$ that has a Gaussian distribution. B: Reliability of the Confidence Intervals obtained with the Central Intervals method for a bounded $\mu \geq 0$. See text for details.
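The unbiasedness condition (1) can be checked numerically for the equal-tail test. The sketch below assumes $\hat\mu \sim N(\mu, 1)$, $\mu_0 = 0$ and $\alpha = 0.10$, which are illustrative choices. It evaluates the power function $\pi_{\mu_0}(\mu_1) = 1 - \beta_{\mu_0}(\mu_1)$ on a grid of alternatives and confirms that it never drops below $\alpha$. Its minimum sits at $\mu_1 = \mu_0$.

```python
from statistics import NormalDist

N = NormalDist()  # assumption: mu_hat ~ N(mu, 1)


def power(mu0, mu1, alpha=0.10):
    """pi_{mu0}(mu1): probability that mu_hat falls outside the equal-tail
    acceptance interval of mu0 when the true value is mu1."""
    z = N.inv_cdf(1 - alpha / 2)
    p_accept = N.cdf(mu0 + z - mu1) - N.cdf(mu0 - z - mu1)  # beta_{mu0}(mu1)
    return 1 - p_accept


alpha, mu0 = 0.10, 0.0
alternatives = [mu0 + 0.01 * k for k in range(-500, 501)]
powers = [power(mu0, m, alpha) for m in alternatives]
worst = min(range(len(powers)), key=powers.__getitem__)

print(min(powers) >= alpha - 1e-12)  # True: condition (1) holds on the grid
print(alternatives[worst])           # 0.0: minimum power at mu1 = mu0
```

The same function also gives equal power for $\mu_1 = \mu_0 \pm \delta$, which is the balance between smaller and larger alternatives shown in Fig. 1A.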

Therefore, the Central Intervals method is widely used because it corresponds to a Uniformly Most 
Powerful Unbiased test. Other methods based on asymmetric tests unavoidably introduce some bias. 

Figure 1A illustrates the power $\pi$ in the Central Intervals method for an estimator $\hat\mu$ of $\mu$ that has a Gaussian distribution. The Gaussian distribution of $\hat\mu$ for $\mu = \mu_0$ is depicted qualitatively above the horizontal line for $\mu = \mu_0$. The $100(1-\alpha)\%$ acceptance interval corresponding to the null hypothesis $\mu_0$ is bounded by the two vertical lines. The area of the two dark-shaded tails of the distribution is equal to $\alpha$.

Let us consider for example the alternative hypothesis $\mu_1^+ > \mu_0$ (similar considerations apply to the alternative hypothesis $\mu_1^- < \mu_0$). The Gaussian distribution of $\hat\mu$ for $\mu = \mu_1^+$ is depicted qualitatively above the horizontal line for $\mu = \mu_1^+$ in Fig. 1A. The probability $\beta^+$ of a Type II error in testing $\mu_0$ against $\mu_1^+$ is given by the integral of the distribution of $\hat\mu$ for $\mu = \mu_1^+$ in the interval between the two vertical lines. The corresponding area is shown dark-shaded in Fig. 1A. The power to test the null hypothesis $\mu_0$ against the larger alternative $\mu_1^+ > \mu_0$ is given by the integral of the distribution of $\hat\mu$ for $\mu = \mu_1^+$ in the two semi-infinite intervals of $\hat\mu$ external to the two vertical lines. The corresponding areas are shown light-shaded in Fig. 1A (only the one on the right is large enough to be visible).
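The shaded areas of Fig. 1A can be reproduced in a few lines. The sketch assumes $\hat\mu \sim N(\mu, 1)$, $\mu_0 = 0$, $\mu_1^+ = 2$ and a 90% acceptance interval, all picked only for illustration. It integrates the density for $\mu = \mu_1^+$ over the acceptance interval with Simpson's rule to obtain $\beta^+$. Adding the two external tails gives the power, and the left tail turns out to be tiny.

```python
import math
from statistics import NormalDist

N = NormalDist()  # assumption: mu_hat ~ N(mu, 1)
mu0, mu1_plus, cl = 0.0, 2.0, 0.90
z = N.inv_cdf(1 - (1 - cl) / 2)
lo, hi = mu0 - z, mu0 + z  # acceptance interval for H0: mu = mu0


def density(x, mu):
    """Gaussian density of mu_hat with unit sigma."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def simpson(f, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


beta_plus = simpson(lambda x: density(x, mu1_plus), lo, hi)  # dark-shaded area
left_tail = N.cdf(lo - mu1_plus)         # light-shaded, barely visible
right_tail = 1 - N.cdf(hi - mu1_plus)    # light-shaded, clearly visible
power_plus = left_tail + right_tail
print(round(beta_plus, 3), round(power_plus, 3))
```

With these values $\beta^+$ is about 0.36 and the power about 0.64, and the left tail contributes only about $10^{-4}$.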

From Fig. 1A one can see that the powers corresponding to alternative hypotheses $\mu_1^-$ and $\mu_1^+$, respectively smaller and larger than the null hypothesis $\mu_0$ and at equal distance from it, are equal. The Central Intervals method produces the most reliable results in the case of an unbounded $\mu$, because the power is perfectly balanced. Problems arise if one considers the measurement of a bounded quantity $\mu$. As illustrated in Fig. 1B for the case of a bounded $\mu \geq 0$, the balanced power in the Central Intervals method is not appropriate. Indeed, a high power to test $\mu_0$ against $\mu_1^- < \mu_0$ when $\mu_0$ is near the boundary is not needed, because the alternatives $\mu_1^- < \mu_0$ are limited. As a result, in this case the Central Intervals method produces clearly unreliable Confidence Intervals if the value of $\hat\mu_{\mathrm{obs}}$ lies on the left-hand side of Fig. 1B. Sometimes the Confidence Interval can be empty, giving no information. Sometimes one can get a very stringent upper limit, much smaller than the exclusion potential of the experiment [?, ?]. This possibility is very dangerous, because it can lead to wrong conclusions if interpreted in inappropriate ways. In any case it gives no useful information on the value of $\mu$.

Fig. 3: A: Power $\pi$ in the Unified Approach for an estimator $\hat\mu$ of $\mu \geq 0$ that has a Gaussian distribution. B: Reliability.
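Here is a minimal sketch of how the Central Intervals method behaves for a bounded $\mu \geq 0$. It assumes $\hat\mu \sim N(\mu, 1)$ and that the unphysical part $\mu < 0$ of the 90% central interval is simply cut away. The function name is hypothetical.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645: half-width of a 90% central interval


def central_interval_bounded(mu_hat_obs, z=Z90):
    """90% central interval restricted to mu >= 0, or None when it is empty."""
    lo, hi = mu_hat_obs - z, mu_hat_obs + z
    if hi < 0:
        return None  # the whole interval lies in the unphysical region
    return max(lo, 0.0), hi


for mu_hat_obs in (1.0, -1.0, -1.6, -2.0):
    print(mu_hat_obs, central_interval_bounded(mu_hat_obs))
```

For $\hat\mu_{\mathrm{obs}} = -1.6$ the upper limit is only about 0.04, far below the upper limit of about 1.64 obtained for $\hat\mu_{\mathrm{obs}} = 0$. For $\hat\mu_{\mathrm{obs}} = -2$ the interval is empty.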

In the past the Upper Limits method was rather popular. Figures 2A and 2B show that the Upper Limits method is actually worse than the Central Intervals method, because it is biased in the wrong direction. As a consequence, it produces limits that are practically always unreliable, except perhaps when by chance $\hat\mu_{\mathrm{obs}} \approx 0$.
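The wrong-direction bias of the Upper Limits method shows up in its power function. Assume $\hat\mu \sim N(\mu, 1)$ and 90% upper limits $\mu_{\mathrm{up}} = \hat\mu_{\mathrm{obs}} + 1.28$, both chosen for illustration. Then the acceptance interval for $\mu_0$ is $\hat\mu \geq \mu_0 - 1.28$, so the test rejects only small values of $\hat\mu$. In this sketch its power against larger alternatives falls below $\alpha$, violating condition (1), while its power against smaller alternatives is large.

```python
from statistics import NormalDist

N = NormalDist()  # assumption: mu_hat ~ N(mu, 1)
ALPHA = 0.10
Z1 = N.inv_cdf(1 - ALPHA)  # about 1.28 for one-sided 90% upper limits


def power_upper_limits(mu0, mu1):
    """Probability of rejecting mu0 (mu_hat < mu0 - Z1) when the true value is mu1."""
    return N.cdf(mu0 - Z1 - mu1)


for mu1 in (-1.0, 0.0, 1.0):
    print(mu1, round(power_upper_limits(0.0, mu1), 3))
```

The power is about 0.39 against $\mu_1 = -1$ but only about 0.01 against $\mu_1 = +1$. This is exactly the opposite of what is wanted near a lower boundary.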

The first method biased in the right direction to be proposed is the Unified Approach of Feldman and Cousins [?], which, as illustrated in Fig. 3A, gives more power to test $\mu_0$ against $\mu_1^+ > \mu_0$ than to test it against $\mu_1^- < \mu_0$ when $\mu_0$ is near the boundary. However, the bias is still insufficient to produce reliable results if $\hat\mu_{\mathrm{obs}} \ll 0$: from Fig. 3B one can see that when $\hat\mu_{\mathrm{obs}} \ll 0$ the Confidence Interval gives an upper limit for $\mu$ that is unphysically small [?, ?, ?, ?, 10], much smaller than the exclusion potential of the experiment [?, 7].
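The behaviour of the Unified Approach for $\hat\mu_{\mathrm{obs}} < 0$ can be reproduced with a short sketch of the Feldman-Cousins ordering. It assumes a Gaussian estimator with unit standard deviation and $\mu \geq 0$; these assumptions, the helper names and the grid step are all hypothetical choices for illustration. Values of $\hat\mu$ enter the acceptance interval in decreasing order of the likelihood ratio $R = P(\hat\mu|\mu)/P(\hat\mu|\mu_{\mathrm{best}})$, with $\mu_{\mathrm{best}} = \max(0, \hat\mu)$. Because $R$ is unimodal in this case, the acceptance region is an interval. Its edges follow in closed form from the threshold on $R$, and the sketch finds that threshold by bisection.

```python
from statistics import NormalDist

N = NormalDist()  # assumption: mu_hat ~ N(mu, 1), physical region mu >= 0


def fc_acceptance(mu, cl=0.90):
    """Unified Approach acceptance interval [x1, x2] for mu_hat at true value mu.

    For the threshold R >= exp(-s**2 / 2) the right edge is mu + s, and the left
    edge is mu - s if s <= mu, otherwise (mu**2 - s**2) / (2 * mu) < 0.
    """
    if mu == 0.0:
        # R = 1 for every mu_hat <= 0, so all negative values are accepted first
        return float("-inf"), N.inv_cdf(cl)

    def edges(s):
        left = mu - s if s <= mu else (mu * mu - s * s) / (2 * mu)
        return left, mu + s

    lo_s, hi_s = 0.0, 10.0
    for _ in range(50):  # bisection: the enclosed probability grows with s
        s = 0.5 * (lo_s + hi_s)
        left, right = edges(s)
        if N.cdf(right - mu) - N.cdf(left - mu) < cl:
            lo_s = s
        else:
            hi_s = s
    return edges(hi_s)


def fc_interval(mu_hat_obs, cl=0.90, step=0.005, mu_max=6.0):
    """Invert the belt on a grid of mu >= 0 (the accepted set is contiguous here)."""
    accepted = []
    for k in range(int(round(mu_max / step)) + 1):
        mu = k * step
        x1, x2 = fc_acceptance(mu, cl)
        if x1 <= mu_hat_obs <= x2:
            accepted.append(mu)
    return min(accepted), max(accepted)


for mu_hat_obs in (0.0, -1.0, -2.0, -3.0):
    print(mu_hat_obs, fc_interval(mu_hat_obs))
```

In this sketch the upper limit drops from about 1.64 at $\hat\mu_{\mathrm{obs}} = 0$ to about 0.40 at $\hat\mu_{\mathrm{obs}} = -2$. It keeps shrinking as $\hat\mu_{\mathrm{obs}}$ becomes more negative, which is the behaviour criticized above.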

Figure 4A illustrates the calculation of the power in the Maximum Likelihood Estimator method proposed independently by Ciampolillo in Ref. [1] and by Mandelkern and Schultz in Ref. [2]. In this method the estimator of $\mu$ is not $\hat\mu$, but the maximum likelihood value $\mu^*$ of $\mu$. Since the range of $\mu^*$ is equal to the range of $\mu$, the estimate $\mu^*_{\mathrm{obs}}$ always lies in the physical range of $\mu$. In the case of a Gaussian distribution for $\hat\mu$ illustrated in Fig. 4A, $\mu^* = \hat\mu$ for $\hat\mu \geq 0$ and $\mu^* = 0$ for $\hat\mu < 0$. Therefore, as shown in Fig. 4A, the upper limit for $\mu$ obtained for any $\hat\mu_{\mathrm{obs}} < 0$ is equal to the upper limit obtained for $\hat\mu_{\mathrm{obs}} = 0$.

Fig. 4: A: Power $\pi$ in the Maximum Likelihood method for an estimator $\hat\mu$ of $\mu \geq 0$ that has a Gaussian distribution. B: Reliability.
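Here is a small sketch of why the interval is frozen for negative $\hat\mu_{\mathrm{obs}}$. For a Gaussian likelihood with unit standard deviation (an assumption), maximizing over the physical region $\mu \geq 0$ gives $\mu^* = \max(0, \hat\mu)$. The hypothetical helper below finds $\mu^*$ by golden-section search on $[0, \mu_{\max}]$ instead of using the closed form, so that the clipping at the boundary emerges from the maximization itself. Any belt built for $\mu^*$ then yields the same Confidence Interval for every $\hat\mu_{\mathrm{obs}} \leq 0$.

```python
import math


def log_likelihood(mu, mu_hat_obs):
    """Gaussian log-likelihood with unit sigma, additive constants dropped."""
    return -0.5 * (mu_hat_obs - mu) ** 2


def mle_physical(mu_hat_obs, mu_max=20.0, iters=100):
    """Maximize the log-likelihood over 0 <= mu <= mu_max (golden-section search)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = 0.0, mu_max
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if log_likelihood(c, mu_hat_obs) > log_likelihood(d, mu_hat_obs):
            b, d = d, c          # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # maximum lies in [c, b]
            d = a + g * (b - a)
    return 0.5 * (a + b)


for mu_hat_obs in (1.3, 0.0, -0.5, -2.0):
    print(mu_hat_obs, round(mle_physical(mu_hat_obs), 6))
```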

As one can see from Fig. 4A, the Maximum Likelihood Estimator method has optimal bias. As a consequence, this method produces reliable results for any value of $\hat\mu_{\mathrm{obs}}$, as shown in Fig. 4B.

Let us emphasize that the bias is needed only near the boundary: both the Maximum Likelihood Estimator method and the Unified Approach produce Confidence Intervals that practically coincide with those obtained with the Central Intervals method when $\hat\mu_{\mathrm{obs}} \gg 0$.

In conclusion, we have shown that the Maximum Likelihood Estimator method [1, 2] has optimal power in the case of measurement of a bounded quantity and always produces reliable Confidence Intervals. For these reasons, it should be preferred over the Unified Approach [?], which is, however, better than the Central Intervals method. Worst of all is the method of Upper Limits.